Improve response parsing: no ptypes, faster datetimes #513
Conversation
```r
# Keep only the columns relevant to job termination; the API response
# includes extra fields (e.g. payload, guid) on error that vary by outcome.
keep <- c("app_id", "app_guid", "job_key", "job_id", "result", "code", "error")
```
Is it a problem that we get variable data at this point?
I believe that the variable data is for error cases (i.e. trying to terminate a job that is not currently active, which returns a 409 but doesn't raise an R error) vs. successful requests. I guess it's debatable whether we should be raising an R error more eagerly, but I think the behavior of accommodating different field names is consistent with main.
The one difference here is that main will always return the columns of keep whereas on this branch if any columns from keep are missing across all responses, they'd be omitted. I'll update to make it more consistent with main.
I was actually thinking a bit in the opposite direction: if we sometimes get different field names, that's probably totally OK. This becomes more important (or really, more possible) when these objects are no longer data frames getting passed around; then the normalization of "here are the columns you're getting" can be done at the `as.data.frame()` point. We don't need to do this here, it just smelled a little funny to me.
Gotcha yeah, if we were returning a list then I agree it'd make more sense to not worry about the columns -- the responses are what they are, and that may include different fields.
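For illustration, the deferred-normalization idea discussed above could look something like this sketch (the helper name and shape are hypothetical, not the package's actual code):

```r
# Hypothetical helper: pad any missing columns with NA and reorder, so every
# result has the same shape regardless of which fields the API returned.
normalize_columns <- function(df, keep) {
  missing <- setdiff(keep, names(df))
  if (length(missing) > 0) {
    df[missing] <- NA
  }
  df[keep]
}

# An error response missing most fields still normalizes to the full set:
resp <- data.frame(app_id = 1L, result = "terminated")
keep <- c("app_id", "app_guid", "job_key", "job_id", "result", "code", "error")
normalize_columns(resp, keep)
```

Raw responses stay as they are; the shape guarantee is applied at one well-defined conversion point.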
```r
# The older /applications/ endpoint returns timestamps as Unix epoch integers
# and ID fields as integers. Normalize to match the v1 endpoint's types.
# For the v1 endpoint these are already character/POSIXct, so the coercions
# are no-ops.
```
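An idempotent coercion along those lines might look like this (sketch only; the helper name is hypothetical, not the package's actual function):

```r
# Hypothetical sketch: coerce epoch-seconds integers to POSIXct (UTC), while
# passing already-POSIXct input through unchanged -- a no-op for the v1
# endpoint's values, as the comment above describes.
to_posixct <- function(x) {
  if (inherits(x, "POSIXct")) {
    return(x)
  }
  as.POSIXct(as.numeric(x), origin = "1970-01-01", tz = "UTC")
}

to_posixct(1666000000L)  # epoch integer from the old endpoint
to_posixct(as.POSIXct("2022-10-17 10:26:40", tz = "UTC"))  # passes through unchanged
```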
It's probably not worth too much digging, but if you know off the top of your head: how old are older versions here?
GET /v1/content/{guid}/jobs was added in October 2022 https://docs.posit.co/connect/news/#rstudio-connect-2022.10.0; the old applications endpoint was removed July 2025.
```diff
-parse_connectapi_typed(bundles, connectapi_ptypes$bundles)
+out <- parse_connectapi_typed(bundles, datetime_cols = "created_time")
+coerce_fs_bytes(out, "size")
```
We can keep this here, but I do wonder if providing these as fs::byte types is really useful?
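For context on that question: `fs::as_fs_bytes()` keeps the column numeric but prints with human-readable unit suffixes (this assumes the fs package is installed; whether that's worth the dependency is exactly the open question above):

```r
library(fs)

sizes <- fs::as_fs_bytes(c(1024, 1048576, 2500000000))
sizes      # prints with unit suffixes (K, M, G) rather than raw byte counts
sizes * 2  # still numeric underneath, so arithmetic and sorting work
```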
```r
# No shiny apps are deployed in integration tests, so this may be empty.
if (nrow(shiny_usage_local) > 0) {
  expect_gt(length(colnames(shiny_usage)), 1)
  expect_true("content_guid" %in% names(shiny_usage_local))
  expect_s3_class(shiny_usage_local$started, "POSIXct")
}
})
```
I assume the if here is because a 0-row df doesn't get coerced to this type? But is that true / is that what we want?
If there are no shiny apps, the usage data endpoint returns an empty response with no rows or columns, so we can't assert the column names/types.
Ah, hmmm, that seems not ideal, but if it's not a regression in this work that's fine. But maybe a follow-up would be good?
It is a regression in this work. Previously we would have been able to construct a 0-row data frame with the expected columns based on the ptypes. Since we want to remove the ptypes, we now have no way to know what columns to create. The regression here is inherent to the motivation of the PR. I'm not sure what follow-up we could do that wouldn't bring back ptypes in some form.
Aaah ok, I get it now: the response from the server is a different shape if there are no visits to shiny apps to report, yeah?
I did a bit of poking and it looks like that is also common across other endpoints too (e.g. content search returns an empty list when there are no matches). So that will have a similar issue too.
tests/testthat/test-get.R
```diff
-id = c("8966707", "8966708", "8967206", "8967210", "8966214"),
+id = c(8966707L, 8966708L, 8967206L, 8967210L, 8966214L),
```
Are these supposed to remain characters? I believe they should (given how we do our IDs), but maybe I'm missing something?
They changed because we now return whatever the API returns rather than trying to coerce to specific types, and the test fixtures have integers rather than characters. I'll update those, but the behavior here is correct; it only seems confusing because these are unit tests that can get out of sync with Connect's actual response types.
jonkeane left a comment
Sorry I commented outside of a review so that I didn't accidentally leave them without sending. Overall this looks good, and I'm pro getting this on main and using it locally to confirm it works well.
Do you think it would be a good thing to add a test or two that simulate the changes we had to make in #512?
…rned from the connect server
Intent
Fixes #483
Approach
Removes the `connectapi_ptypes` dictionary and the vctrs-based type coercion system (`ensure_columns`, `ensure_column`, `vec_cast`). Instead, each getter function declares its own `datetime_cols` and applies lightweight post-parse coercion via `coerce_datetime()`, which handles RFC 3339 strings, epoch seconds, POSIXct pass-through, and all-NA columns.

Parsing pipeline:

- `Connect$request()` now uses `jsonlite::fromJSON()` instead of `httr::content(as = "parsed")`, giving us control over jsonlite's simplification behavior
- `parse_connect_rfc3339()` uses a vectorized substr-based parser (faster than strptime on large vectors)
- `page_cursor()` fetches subsequent pages with `simplify = TRUE` so jsonlite builds data frames in C, then combines them with `vctrs::vec_rbind()`

I started this work while trying to improve performance of `get_usage_static()` (see profiling on #501 (comment)). With this branch, `get_usage_static()` is ~20% faster than on main for 175k records.

Checklist
- [ ] `NEWS.md` (referencing the connected issue if necessary)?
- [ ] `devtools::document()`?
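As a footnote on the vectorized RFC 3339 parsing mentioned in the approach above, the substr-based idea can be sketched like this (illustrative only, not the package's actual `parse_connect_rfc3339()`; assumes fixed-layout UTC timestamps like `"2022-10-01T12:30:00Z"`):

```r
# Sketch: pull fixed-position date parts out with vectorized substr() and
# build POSIXct via ISOdatetime(), avoiding per-element strptime() parsing.
parse_rfc3339_fast <- function(x) {
  ISOdatetime(
    as.integer(substr(x, 1, 4)),   # year
    as.integer(substr(x, 6, 7)),   # month
    as.integer(substr(x, 9, 10)),  # day
    as.integer(substr(x, 12, 13)), # hour
    as.integer(substr(x, 15, 16)), # minute
    as.integer(substr(x, 18, 19)), # second
    tz = "UTC"
  )
}

parse_rfc3339_fast(c("2022-10-01T12:30:00Z", "2025-07-04T00:00:00Z"))
```

Each `substr()` call is vectorized over the whole column, which is where the speedup on large usage datasets comes from.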